Abstract: A paper was published (Harsha and Subrahamanian Moosath, 2014) in which
the authors claimed to have discovered an extension to Amari's $\alpha$-geometry through a
general monotone embedding function. It is pointed out here that this so-called
$(F, G)$-geometry (which includes $F$-geometry as a special case) is identical to Zhang's
(2004) extension to the $\alpha$-geometry, where the pair of monotone embedding functions
was named $\rho$ and $\tau$ instead of the $F$ and $H$ used in Harsha and Subrahamanian Moosath
(2014). Their weighting function $G$ for the Riemannian metric arises only cosmetically,
from rewriting the score function in log-representation as opposed to the $(\rho, \tau)$-representation
in Zhang (2004). It is further shown here that the metric and $\alpha$-connections
obtained by Zhang (2004) through arbitrary monotone embeddings constitute a unique extension
of the $\alpha$-geometric structure. As a special case, Naudts' (2004) $\phi$-logarithm embedding
(using the so-called $\log_\phi$ function) is recovered with the identification $\rho = \phi$, $\tau = \log_\phi$, with
the $\phi$-exponential $\exp_\phi$ given by the associated convex function linking the two representations.


Keywords: $\alpha$-embedding; monotone embedding; conjugate embedding; generalized
Fisher-Rao metric; Amari-Chentsov tensor; deformed logarithm; representation duality;
$(\rho, \tau)$-geometry


In a recent paper that appeared in Entropy (Harsha and Subrahamanian Moosath, 2014) [1], the
authors proposed an extension to Amari's $\alpha$-geometry, which they call $F$- or $(F, G)$-geometry, where
$F$ is a monotone embedding function and $G$ is the weighting function for taking the expectation of
random variables in calculating the Riemannian metric ($G = 1$ reduces to $F$-geometry, with the standard
Fisher-Rao metric). This paper serves the purpose of pointing out that $(F, G)$-geometry as proposed
is the same as what Zhang (2004) [2] has obtained for extending the $\alpha$-geometry and captured in his


Entropy  2015, 17 



subsequent work [4-8]. The metric and affine connections proposed by [1] are identical to those of [2] apart from
the notation: the embedding functions $F$ and $H$ in [1] were denoted as $\rho$ and $\tau$ in [2], and the weighting
function $G$ in [1] is a trivial rewriting of the convex function $f$ used by [2].

This paper starts in Section 1 with a review of Amari's $\alpha$-geometry and $\alpha$-embedding, a review
of Zhang's (2004) [2] extension to $\rho$-embedding with an arbitrary monotone function and a summary
of Harsha and Subrahamanian Moosath (2014) [1]. Then, the equivalence of [1] to [2] is shown.
In Section 2, after analyzing the group of monotone embedding functions, a stronger statement is
made: the construction of [2] is a unique dualistic extension of Amari's $\alpha$-geometry through arbitrary
monotone embedding in place of $\alpha$-embedding. As an important special case, we illustrate how the
deformed logarithm $\log_\phi$ associated with an arbitrary strictly increasing function $\phi$, as investigated by
Naudts (2004) [3], arises naturally from identifying $\phi$ with $\rho$ and with a proper choice of the auxiliary
function $f$ as a part of Zhang's theory.

1. Equivalence of $(F, G)$-Geometry to Zhang's (2004) [2] $(\rho, \tau)$-Geometry

1.1. Amari's $\alpha$-Geometry and $\alpha$-Embedding


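The now standard differential geometric characterization of the manifold
$\mathcal{M}_\Theta = \{p(\cdot|\theta),\ \theta \in \Theta \subset \mathbb{R}^n\}$ of parametric probability functions $p$
(probability densities or probability distributions) is through the
Fisher-Rao metric $g_{ij}$ as its Riemannian metric:

$$ g_{ij}(\theta) = E_\mu\left\{ p(\zeta|\theta)\, \frac{\partial \log p(\zeta|\theta)}{\partial \theta^i}\, \frac{\partial \log p(\zeta|\theta)}{\partial \theta^j} \right\} \tag{1} $$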


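and a family of $\alpha$-connections (given by Amari [9,10]) with coefficients $\Gamma^{(\alpha)}_{ij,k}$ ($\alpha \in \mathbb{R}$):

$$ \Gamma^{(\alpha)}_{ij,k}(\theta) = E_\mu\left\{ \left( \frac{1-\alpha}{2}\, \frac{\partial \log p(\zeta|\theta)}{\partial \theta^i}\, \frac{\partial \log p(\zeta|\theta)}{\partial \theta^j} + \frac{\partial^2 \log p(\zeta|\theta)}{\partial \theta^i \partial \theta^j} \right) \frac{\partial p(\zeta|\theta)}{\partial \theta^k} \right\} \tag{2} $$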


Here, $E_\mu$ denotes the expectation with respect to a background measure $\mu$ of the random variable denoted
by $\zeta$:

$$ E_\mu\{\cdot\} = \int (\cdot)\, \mu(d\zeta). \tag{3} $$

The $\alpha$-connection is constructed as a convex combination of a pair of conjugate connections $\Gamma$, $\Gamma^*$:

$$ \Gamma^{(\alpha)}_{ij,k}(\theta) = \frac{1+\alpha}{2}\, \Gamma_{ij,k}(\theta) + \frac{1-\alpha}{2}\, \Gamma^*_{ij,k}(\theta), \tag{4} $$

where $\Gamma = \Gamma^{(1)}$ is frequently called the e-connection ($\alpha = 1$) and $\Gamma^* = \Gamma^{(-1)}$ the m-connection ($\alpha = -1$).
A Riemannian manifold $\mathcal{M}_\Theta$ with its metric $g$ and the family of $\alpha$-connections $\Gamma^{(\alpha)}$ in the form of (1)
and (2) has been called $\alpha$-geometry. Amari's $\alpha$-geometry can be specified in terms of a symmetric
(0,2)-tensor $g_{ij}$ (the Fisher-Rao metric) and a totally symmetric (0,3)-tensor $T_{ijk}$ (sometimes called the
Amari-Chentsov tensor), which is linked to the $\alpha$-connections via:


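$$ \Gamma^{(\alpha)}_{ij,k}(\theta) = \Gamma^{\mathrm{LC}}_{ij,k}(\theta) - \frac{\alpha}{2}\, T_{ijk}(\theta), \tag{5} $$

where $\Gamma^{\mathrm{LC}}_{ij,k}$ is the Levi-Civita connection corresponding to the Riemannian metric $g$.

As an extension of the logarithmic embedding $l(p) = \log p$ of a probability density function $p$, an
$\alpha$-embedding function [10] is defined through $l^{(\alpha)} : \mathbb{R}^+ \to \mathbb{R}$:

$$ l^{(\alpha)}(t) = \begin{cases} \log t, & \alpha = 1, \\ \dfrac{2}{1-\alpha}\, t^{(1-\alpha)/2}, & \alpha \neq 1. \end{cases} \tag{6} $$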


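It is an interesting observation (e.g., p. 46 in [11]) that the $\alpha$-geometry can be recovered under such
$\alpha$-representation (scaling) of the probability function; that is, the Fisher-Rao metric turns out to be
$\alpha$-independent (i.e., embedding independent), and the $\alpha$-connections are expressed through the pair of
$\pm\alpha$-representations:

$$ g_{ij}(\theta) = E_\mu\left\{ \frac{\partial l^{(\alpha)}(p(\zeta|\theta))}{\partial \theta^i}\, \frac{\partial l^{(-\alpha)}(p(\zeta|\theta))}{\partial \theta^j} \right\}, \tag{7} $$

$$ \Gamma^{(\alpha)}_{ij,k}(\theta) = E_\mu\left\{ \frac{\partial^2 l^{(\alpha)}(p(\zeta|\theta))}{\partial \theta^i \partial \theta^j}\, \frac{\partial l^{(-\alpha)}(p(\zeta|\theta))}{\partial \theta^k} \right\}. \tag{8} $$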


A variant of the $\alpha$-embedding of a probability function plays an important role in Tsallis statistics;
see [12-14]. On the geometric side, [15,16] showed that the $\alpha$-scaling of the probability functions
leads to a conformal transformation.
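The $\alpha$-independence of the metric in (7) can be checked numerically. The sketch below is only an illustration under stated assumptions: the Binomial(2, $\theta$) family, the counting measure as background measure, and the helper names `binomial2`, `d_l_alpha` and `metric_alpha_rep` are all choices made here, not taken from the cited works.

```python
def binomial2(theta):
    """Binomial(2, theta) probabilities and their theta-derivatives (illustrative family)."""
    p = [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]
    dp = [-2 * (1 - theta), 2 - 4 * theta, 2 * theta]
    return p, dp


def d_l_alpha(p, dp, alpha):
    """theta-derivative of l^(alpha)(p), using (l^(alpha))'(t) = t**(-(1 + alpha)/2)."""
    return p ** (-(1 + alpha) / 2) * dp


def metric_alpha_rep(theta, alpha):
    """Right-hand side of Equation (7) with the counting measure as background measure."""
    p, dp = binomial2(theta)
    return sum(d_l_alpha(pi, dpi, alpha) * d_l_alpha(pi, dpi, -alpha)
               for pi, dpi in zip(p, dp))


# Fisher information of Binomial(2, theta) is 2 / (theta * (1 - theta)).
fisher = 2 / (0.3 * 0.7)
for alpha in (-1.0, 0.0, 0.5, 1.0):
    print(alpha, metric_alpha_rep(0.3, alpha), fisher)
```

For every $\alpha$ the product of the two derivative factors collapses to $(\partial p)^2 / p$, which is why the printed values agree with the Fisher information.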


1.2. Zhang (2004) [2] Extension: $\rho$-Embedding and $(\rho, \tau)$-Geometry

Zhang [2,4,6] obtained generalizations of the $\alpha$-geometry for a pair of monotone embeddings, called
$\rho$- and $\tau$-embeddings, generalizing the $\alpha$-embedding. Given any smooth strictly convex function $f : \mathbb{R} \to \mathbb{R}$,
with convex conjugate $f^*$ given by:

$$ f^*(t) = t\,(f')^{-1}(t) - f\big((f')^{-1}(t)\big), \tag{9} $$

Zhang (2004) defines a pair of conjugate representations [2] (Section 3.2) using two strictly increasing
functions $\rho$, $\tau$ from $\mathbb{R} \to \mathbb{R}$:

(1) we call $\rho$-representation of a probability function $p$ the mapping $p \mapsto \rho(p)$;

(2) we say the $\tau$-representation of the probability function $p \mapsto \tau(p)$ is conjugate to the $\rho$-representation with
respect to a smooth and strictly convex function $f$, or simply $\tau$ is $f$-conjugate to $\rho$, if:

$$ \tau(p) = f'(\rho(p)) = \big((f^*)'\big)^{-1}(\rho(p)), \tag{10} $$

which can be equivalently written as:

$$ \rho(p) = (f')^{-1}(\tau(p)) = (f^*)'(\tau(p)). \tag{11} $$

The equalities in (10) and (11) hold, and they are equivalent, because $f'$ and $(f^*)'$ are both strictly
increasing (due to the strict convexity of $f$ and $f^*$) and because $(f^*)^* = f$, $(f^*)' = (f')^{-1}$. Sometimes, we write
$f' = \sigma$, $(f^*)' = \sigma^{-1}$ for convenience, so $\sigma(\rho) = \tau$, $\sigma^{-1}(\tau) = \rho$, for a strictly increasing function $\sigma$.

As a first example, we may set $\rho(t) = t$, $\tau(t) = \log t$. Then $f'(t) = \log t$, and we can derive
$f(t) = t \log t - t$ and, from (9), $f^*(t) = \exp(t)$; adding a constant to $f$ (e.g., $f(t) = t \log t - t + 1$) merely
subtracts the same constant from $f^*$ and does not affect the metric and connections, since only derivatives
of $f$ enter them. That $\rho(p)$ and $\tau(p)$ are just the $p$ and $\log p$ representations reflects the conventional
dual embeddings that were later extended to $\phi$- and $\log_\phi$-embeddings in [3]. In Section 2.2, it will
be shown that Naudts' $\phi$-logarithm formulation is recovered as a special case of the $(\rho, \tau)$-embedding.

As another example, we may set $\rho(p) = l^{(\beta)}(p)$ to be the $\beta$-representation given by Equation (6);
this would have been traditionally called "alpha-embedding", except we use the symbol $\beta$, so that the
$\alpha$-parameter will be reserved for indexing $\alpha$-connections. In this case, the conjugate representation is
the $(-\beta)$-representation $\tau(p) = l^{(-\beta)}(p)$:

$$ \rho(p) = l^{(\beta)}(p) \ \longleftrightarrow\ \tau(p) = l^{(-\beta)}(p). \tag{12} $$


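In this case, $\rho$ and $\tau$ are conjugate with respect to $f$, where $f$ and its conjugate $f^*$ are given by:

$$ f(t) = \frac{2}{1+\beta} \left( \frac{1-\beta}{2}\, t \right)^{\frac{2}{1-\beta}}, \qquad f^*(t) = \frac{2}{1-\beta} \left( \frac{1+\beta}{2}\, t \right)^{\frac{2}{1+\beta}}. \tag{13} $$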
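The conjugacy claimed for the pair (12) can be verified directly from (10), (11) and (13). The following is a minimal numerical sketch, assuming $-1 < \beta < 1$ and $p > 0$; the function names are illustrative and the derivatives of $f$ and $f^*$ are computed by hand from (13).

```python
import math


def l_rep(t, a):
    """Alpha-type embedding of Equation (6) with parameter a."""
    if a == 1:
        return math.log(t)
    return 2.0 / (1.0 - a) * t ** ((1.0 - a) / 2.0)


def f_prime(u, beta):
    """f'(u) for f in Equation (13), assuming -1 < beta < 1 and u > 0."""
    return 2.0 / (1.0 + beta) * ((1.0 - beta) / 2.0 * u) ** ((1.0 + beta) / (1.0 - beta))


def f_star_prime(v, beta):
    """(f*)'(v) for f* in Equation (13), assuming -1 < beta < 1 and v > 0."""
    return 2.0 / (1.0 - beta) * ((1.0 + beta) / 2.0 * v) ** ((1.0 - beta) / (1.0 + beta))


def conjugacy_gaps(p, beta):
    """Residuals of Equations (10) and (11) for rho = l^(beta), tau = l^(-beta)."""
    rho, tau = l_rep(p, beta), l_rep(p, -beta)
    return f_prime(rho, beta) - tau, f_star_prime(tau, beta) - rho


print(conjugacy_gaps(0.37, 0.3))  # both residuals should vanish up to rounding
```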


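Based on divergence functions constructed under monotone embedding, Zhang [2] showed:

Proposition 1. ([2], Proposition 7) Using an arbitrary monotone embedding function $\rho$ and an
arbitrary smooth strictly convex function $f$, a generalization of $\alpha$-geometry is obtained, with metric
and $\alpha$-connections taking the form:

$$ g_{ij}(\theta) = E_\mu\left\{ f''(\rho(p(\zeta|\theta)))\, \frac{\partial \rho(p(\zeta|\theta))}{\partial \theta^i}\, \frac{\partial \rho(p(\zeta|\theta))}{\partial \theta^j} \right\}, \tag{14} $$

$$ \Gamma^{(\alpha)}_{ij,k}(\theta) = E_\mu\left\{ \frac{1-\alpha}{2}\, f'''(\rho(p(\zeta|\theta)))\, A_{ijk} + f''(\rho(p(\zeta|\theta)))\, B_{ijk} \right\}, \tag{15} $$

where:

$$ A_{ijk}(\zeta, \theta) = \frac{\partial \rho(p(\zeta|\theta))}{\partial \theta^i}\, \frac{\partial \rho(p(\zeta|\theta))}{\partial \theta^j}\, \frac{\partial \rho(p(\zeta|\theta))}{\partial \theta^k}, \qquad B_{ijk}(\zeta, \theta) = \frac{\partial^2 \rho(p(\zeta|\theta))}{\partial \theta^i \partial \theta^j}\, \frac{\partial \rho(p(\zeta|\theta))}{\partial \theta^k}. \tag{16} $$

As special cases,

$$ \Gamma^{(1)}_{ij,k}(\theta) = E_\mu\left\{ f''(\rho(p))\, \frac{\partial^2 \rho(p)}{\partial \theta^i \partial \theta^j}\, \frac{\partial \rho(p)}{\partial \theta^k} \right\}, \tag{17} $$

$$ \Gamma^{(-1)}_{ij,k}(\theta) = E_\mu\left\{ \frac{\partial \rho(p)}{\partial \theta^k} \left( f'''(\rho(p))\, \frac{\partial \rho(p)}{\partial \theta^i}\, \frac{\partial \rho(p)}{\partial \theta^j} + f''(\rho(p))\, \frac{\partial^2 \rho(p)}{\partial \theta^i \partial \theta^j} \right) \right\}, \tag{18} $$

with $\Gamma^{(1)} = \Gamma$ and $\Gamma^{(-1)} = \Gamma^*$.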

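Furthermore, taking a pair of monotone representations, the metric tensor and affine connections stated
in Proposition 1 have dualistic expressions:

Corollary 1. ([2], Proposition 8) Using two arbitrary monotone embedding functions $\rho$ and $\tau$, the metric
and $\alpha$-connections of (14)-(16) are:

$$ g_{ij}(\theta) = E_\mu\left\{ \frac{\partial \rho(p(\zeta|\theta))}{\partial \theta^i}\, \frac{\partial \tau(p(\zeta|\theta))}{\partial \theta^j} \right\}, \tag{19} $$

$$ \Gamma^{(\alpha)}_{ij,k}(\theta) = E_\mu\left\{ \frac{1-\alpha}{2}\, \frac{\partial^2 \tau(p(\zeta|\theta))}{\partial \theta^i \partial \theta^j}\, \frac{\partial \rho(p(\zeta|\theta))}{\partial \theta^k} + \frac{1+\alpha}{2}\, \frac{\partial^2 \rho(p(\zeta|\theta))}{\partial \theta^i \partial \theta^j}\, \frac{\partial \tau(p(\zeta|\theta))}{\partial \theta^k} \right\}. \tag{20} $$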

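As a special case, when $\rho$, $\tau$ take the familiar alpha-embeddings (12) (using $\beta$ as the parameter), the
$\alpha$-connections become $(\alpha\beta)$-connections:

$$ \Gamma^{(\alpha)}_{ij,k}(\theta) = E_\mu\left\{ \left( \frac{1-\alpha\beta}{2}\, \frac{\partial \log p(\zeta|\theta)}{\partial \theta^i}\, \frac{\partial \log p(\zeta|\theta)}{\partial \theta^j} + \frac{\partial^2 \log p(\zeta|\theta)}{\partial \theta^i \partial \theta^j} \right) \frac{\partial p(\zeta|\theta)}{\partial \theta^k} \right\}, \tag{21} $$

with the product $\alpha\beta$ playing the role of the alpha-parameter indexing the family of connections.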
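The appearance of the product $\alpha\beta$ comes down to one arithmetic identity. Substituting $\rho = l^{(\beta)}$ and $\tau = l^{(-\beta)}$ into (20), the second derivatives contribute the factors $(1+\beta)/2$ (from $\tau$) and $(1-\beta)/2$ (from $\rho$) in front of the product of first derivatives of $\log p$, so the combined coefficient is $\frac{1-\alpha}{2}\frac{1+\beta}{2} + \frac{1+\alpha}{2}\frac{1-\beta}{2}$. The sketch below checks with exact rational arithmetic that this equals $(1-\alpha\beta)/2$; the function name is illustrative.

```python
from fractions import Fraction


def mixed_coefficient(alpha, beta):
    """Coefficient of (d log p)(d log p) after inserting the beta-embeddings (12) into (20)."""
    a, b = Fraction(alpha), Fraction(beta)
    return (1 - a) / 2 * (1 + b) / 2 + (1 + a) / 2 * (1 - b) / 2


alpha, beta = Fraction(1, 3), Fraction(2, 5)
print(mixed_coefficient(alpha, beta) == (1 - alpha * beta) / 2)
```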
1.3. Harsha and Subrahamanian Moosath's (2014) Work [1]

Using a monotone embedding function denoted as $F$ and a weighting function denoted as $G$ ($G = 1$
is a special case that reduces to what they called $F$-geometry), these authors [1] proposed the $(F, G)$-metric as
(their Equation (33) in [1]):

$$ g^{F,G}_{ij} = E_p\left\{ G(p)\, \frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j} \right\}, \tag{22} $$


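with affine connection given as (their Equation (34)):

$$ \Gamma^{F,G}_{ijk} = E_p\left\{ G(p)\, \frac{\partial \log p}{\partial \theta^k} \left( \frac{\partial^2 \log p}{\partial \theta^i \partial \theta^j} + \left( 1 + \frac{p F''(p)}{F'(p)} \right) \frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j} \right) \right\}. \tag{23} $$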


Note $E_p\{(\cdot)\} = E_\mu\{(\cdot)\, p\}$. (23) is the expression for the e-connection ($\alpha = 1$), $\Gamma^{F,G}_{ijk}$. To express the
conjugate connection (m-connection, $\alpha = -1$), $\Gamma^{H,G}_{ijk}$, a dual embedding function $H$ is introduced, which
is shown ([1], Theorem 3.2) to be related to $F$ and $G$ via (their Equation (36)):

$$ H'(p) = \frac{G(p)}{p F'(p)}. \tag{24} $$


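In such a case, the conjugate connection $\Gamma^{H,G}_{ijk}$ (sic, more accurately $(\Gamma^{F,G})^*_{ijk}$) is expressed as (their
Equation (37)):

$$ \Gamma^{H,G}_{ijk} = E_p\left\{ G(p)\, \frac{\partial \log p}{\partial \theta^k} \left( \frac{\partial^2 \log p}{\partial \theta^i \partial \theta^j} + \left( \frac{p G'(p)}{G(p)} - \frac{p F''(p)}{F'(p)} \right) \frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j} \right) \right\}. \tag{25} $$

We now show the equivalence of the three expressions (14), (17), (18) from the work [2] with the
three corresponding expressions (22), (23), (25) from the work [1].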


Statement 1. Equations (14) and (22) give the same Riemannian metric; Equations (17) and (23) give
the same affine connection; and Equations (18) and (25) give the same conjugate connection, as long as:

$$ F(p) = \rho(p), \qquad G(p) = (\rho'(p))^2\, p\, f''(\rho(p)). \tag{26} $$


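Proof. Re-writing (14), and keeping in mind:

$$ \frac{\partial \rho(p)}{\partial \theta^i} = p\, \rho'(p)\, \frac{\partial \log p}{\partial \theta^i}, \tag{27} $$

we obtain:

$$ g_{ij}(\theta) = E_\mu\left\{ f''(\rho(p))\, (p\, \rho'(p))^2\, \frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j} \right\} = E_p\left\{ p\, (\rho'(p))^2 f''(\rho(p))\, \frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j} \right\}. \tag{28} $$

Comparing the above with (22), obviously, $F$ is just $\rho$, and $G$ is linked to $f$ and $\rho$:

$$ G(p) = (\rho'(p))^2\, p\, f''(\rho(p)) = p\, \rho'(p)\, \tau'(p), \tag{29} $$

where we have used (10).

Next, differentiate (27); we obtain:

\begin{align}
\frac{\partial^2 \rho(p)}{\partial \theta^i \partial \theta^j}
 &= \frac{\partial (p\, \rho'(p))}{\partial \theta^j}\, \frac{\partial \log p}{\partial \theta^i} + p\, \rho'(p)\, \frac{\partial^2 \log p}{\partial \theta^i \partial \theta^j} \tag{30} \\
 &= p\, \rho'(p) \left( \frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j} + \frac{\partial^2 \log p}{\partial \theta^i \partial \theta^j} + \frac{p\, \rho''(p)}{\rho'(p)}\, \frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j} \right) \tag{31} \\
 &= p\, \rho'(p) \left( \frac{\partial^2 \log p}{\partial \theta^i \partial \theta^j} + \left( 1 + \frac{p\, \rho''(p)}{\rho'(p)} \right) \frac{\partial \log p}{\partial \theta^i}\, \frac{\partial \log p}{\partial \theta^j} \right). \tag{32}
\end{align}

Identifying $F = \rho$ and making use of (29), we see that (17) is precisely (23).
Finally, differentiate (29),

$$ G'(p) = (\rho'(p))^2 f''(\rho(p)) + (\rho'(p))^3\, p\, f'''(\rho(p)) + 2 \rho'(p)\, \rho''(p)\, p\, f''(\rho(p)). \tag{33} $$

Therefore,

$$ \frac{p\, G'(p)}{G(p)} - \frac{p\, F''(p)}{F'(p)} = 1 + \frac{p\, \rho'(p)\, f'''(\rho(p))}{f''(\rho(p))} + \frac{p\, \rho''(p)}{\rho'(p)}. \tag{34} $$

After substituting (34) and (29) into (25) and making use of (31), the expression (18) results. □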


Statement 2. The conjugate embedding function $H$ is the same as $\tau$. The conjugate connection (25),
when expressed using $H$, has the same form as (23) for $\Gamma$ using $F$.

Proof. Applying Definition (24) together with (29) immediately yields $H' = \tau'$. Therefore, (apart from a constant) $H(p) =
\tau(p)$. Next, we will express (25) explicitly using the conjugate embedding function $H$ (rather than $F$)
and the weighting function $G$. That is to say, we will simplify the terms in the middle parenthesis of (25):

\begin{align}
\frac{p\, G'(p)}{G(p)} - \frac{p\, F''(p)}{F'(p)}
 &= p \left( \log \frac{G(p)}{F'(p)} \right)' = p \big( \log (p\, H'(p)) \big)' = p \big( \log p + \log H'(p) \big)' \tag{35} \\
 &= p \left( \frac{1}{p} + \frac{H''(p)}{H'(p)} \right) = 1 + \frac{p\, H''(p)}{H'(p)}. \tag{36}
\end{align}

Hence, (25) has the same expression as (23), showing the duality between the embedding function $H$ and
the embedding function $F$. □


By Statement 1, starting from $F$ (that is, $\rho$) and $G$ and imposing the conjugacy requirement on the pair
of affine connections, one is guaranteed to derive $H$ (that is, $\tau$) as the conjugate embedding function.

From Statements 1 and 2, we conclude that Harsha and Moosath's $F$-embedding [1] replicates the
$\rho$-embedding of Zhang (2004) [2]; the conjugate $H$-embedding turns out to be identical to the $\tau$-embedding
of [2]. Contrary to the authors' claim (Remark 3.7 of [1], p. 2480), $(F, G)$-geometry is identical to
Zhang's $(\rho, \tau)$-geometry [2]. In particular, their $F$-geometry is recovered by simply choosing $f$ to satisfy
$f''(t) = 1 / \big( \rho^{-1}(t)\, (\rho'(\rho^{-1}(t)))^2 \big)$, for a given $\rho$. The subsequent development in their paper [1], e.g., the
definition of the $F$-affine manifold (their Equation (50)), replicates the definition of the $\rho$-affine manifold
in [2] (Section 3.4).
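The equivalence asserted in Statement 1 can also be confirmed numerically on a concrete example. The sketch below is illustrative only: the Binomial(2, $\theta$) family, the choices $\rho(t) = \sqrt{t}$ and $f(u) = u^3/3$ (strictly convex for $u > 0$), and all function names are assumptions made here; Equations (22) and (23) are read as expectations under $p$, i.e., $E_p\{\cdot\} = E_\mu\{(\cdot)\, p\}$ with the counting measure as $\mu$.

```python
import math


def binomial2(theta):
    """Binomial(2, theta) probabilities with first and second theta-derivatives."""
    p = [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]
    dp = [-2 * (1 - theta), 2 - 4 * theta, 2 * theta]
    d2p = [2.0, -4.0, 2.0]
    return p, dp, d2p


# Hypothetical embedding rho(t) = sqrt(t) and convex function f(u) = u**3 / 3.
def rho(t): return math.sqrt(t)
def rho1(t): return 0.5 / math.sqrt(t)      # rho'
def rho2(t): return -0.25 * t ** -1.5       # rho''
def f2(u): return 2.0 * u                   # f''


def G(t):
    """Weighting function of Statement 1: G(p) = (rho'(p))**2 * p * f''(rho(p))."""
    return rho1(t) ** 2 * t * f2(rho(t))


def zhang(theta):
    """Metric (14) and e-connection (17), written in rho-representation."""
    g = gamma = 0.0
    for p, dp, d2p in zip(*binomial2(theta)):
        d_rho = rho1(p) * dp
        d2_rho = rho2(p) * dp ** 2 + rho1(p) * d2p
        g += f2(rho(p)) * d_rho ** 2
        gamma += f2(rho(p)) * d2_rho * d_rho
    return g, gamma


def harsha_moosath(theta):
    """Metric (22) and connection (23), written in log-representation with F = rho."""
    g = gamma = 0.0
    for p, dp, d2p in zip(*binomial2(theta)):
        dl = dp / p
        d2l = d2p / p - dl ** 2
        g += p * G(p) * dl ** 2
        gamma += p * G(p) * dl * (d2l + (1.0 + p * rho2(p) / rho1(p)) * dl ** 2)
    return g, gamma


print(zhang(0.3), harsha_moosath(0.3))  # the two pairs should coincide up to rounding
```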

During the review of their manuscript [1] and in subsequent personal communications, these authors
argued that they used a different approach: $(F, G)$-geometry is derived by embedding the manifold into
the space of random variables and suitably defining the inner product through using the $F$-expectation
(their Equation (15)) and $(F, G)$-expectation (their Equation (32)) as a general weighted expectation
of a random variable, while Zhang (2004) [2] derived the geometry through constructing a divergence
function. This difference, however, is entirely superficial, because the relationship between divergence
functions and geometric structure (metric and affine connection) is well-established by Eguchi's
work [17,18] and known to information geometers. Therefore, neither the approach nor the results
of Harsha and Moosath's proposed $(F, H, G)$ extension to Amari's $\alpha$-geometry differ from Zhang's
proposed $(\rho, \tau, f)$ extension, with the following correspondence between the symbols used in the two papers:

$$ F \leftrightarrow \rho, \qquad H \leftrightarrow \tau, \tag{37} $$


$$ G(t) \leftrightarrow t\, \rho'(t)\, \tau'(t) = t\, f''(\rho(t))\, (\rho'(t))^2 = t\, (f^*)''(\tau(t))\, (\tau'(t))^2 ; \tag{38} $$

the difference in the representation of the score function, as log-representation in [1] or under $\rho$- or
$\tau$-representation in [2], is cosmetic.

2. Uniqueness of $(\rho, \tau)$-Geometry and Representation Duality

2.1.  Monotone  Embedding  as  a  Transformation  Group 

Monotone  representations  of  any  given  probability  function  form  a  transformation  group,  with 
functional  composition  as  group  composition  operation  and  the  functional  inverse  as  the  group  inverse 
operation.  This  was  pointed  out  by  Zhang  [6]  (Section  2.2.2).  We  state  it  as  a  lemma  here. 

Lemma 1. Denote $\Omega$ as the set of strictly increasing functions from $\mathbb{R} \to \mathbb{R}$. Then, $(\Omega, \circ)$ forms a group,
with $\circ$ denoting functional composition.

Proof. We easily verify that:

(1) closure for $\circ$: for any $\rho_1, \rho_2 \in \Omega$, $\rho_2 \circ \rho_1$, defined as $\rho_2(\rho_1(\cdot))$, is strictly increasing, and hence,
$\rho_2 \circ \rho_1 \in \Omega$;

(2) existence of unique identity element: the identity function $\iota$, which satisfies $\rho \circ \iota = \iota \circ \rho = \rho$, is
strictly increasing, and hence, $\iota \in \Omega$ and is unique;

(3) existence of inverse: for any $\rho \in \Omega$, its functional inverse $\rho^{-1}$, which satisfies $\rho^{-1} \circ \rho = \rho \circ \rho^{-1} = \iota$,
is also strictly increasing, and hence, $\rho^{-1} \in \Omega$;

(4) associativity of $\circ$: for any three $\rho_1, \rho_2, \rho_3 \in \Omega$, $(\rho_1 \circ \rho_2) \circ \rho_3 = \rho_1 \circ (\rho_2 \circ \rho_3)$.

□

Recall that the derivatives of smooth strictly convex functions are strictly increasing functions. From
this perspective, $f' = \tau \circ \rho^{-1} = \tau(\rho^{-1}(\cdot))$, $(f^*)' = \rho \circ \tau^{-1} = \rho(\tau^{-1}(\cdot))$, encountered above, are
themselves two mutually inverse strictly increasing functions. This is the rationale behind Zhang's [2]
choice of $f$ (and $f^*$) as the auxiliary function to capture conjugate embedding, rather than using $G$ as
in [1]. The following identities are useful; they are obtained by differentiating (10) and (11):

$$ f''(\rho(t))\, \rho'(t) = \tau'(t), \qquad (f^*)''(\tau(t))\, \tau'(t) = \rho'(t); \tag{39} $$

therefore:

$$ f''(\rho(t))\, (\rho'(t))^2 = (f^*)''(\tau(t))\, (\tau'(t))^2, \tag{40} $$

and:

$$ f''(\rho(t))\, (f^*)''(\tau(t)) = 1. \tag{41} $$

With respect to (41), taking log on both sides yields:

$$ \log f''(\rho(t)) + \log (f^*)''(\tau(t)) = 0. \tag{42} $$

Move the second term to the right-hand side and differentiate:

$$ \frac{f'''(\rho(t))\, \rho'(t)}{f''(\rho(t))} = - \frac{(f^*)'''(\tau(t))\, \tau'(t)}{(f^*)''(\tau(t))}. \tag{43} $$

Making use of (40) yields:

$$ f'''(\rho(t))\, (\rho'(t))^3 = - (f^*)'''(\tau(t))\, (\tau'(t))^3. \tag{44} $$

Note the coupling between $f$ and $\rho$, $\tau$ given by (10), (11), (40) and (44). They allow us to cast (14) and
(15) in terms of $f^*$ and $\tau$.

Among the triple $(f, \rho, \tau)$, given any two, the third is specified. In particular, if we arbitrarily choose
two strictly increasing functions $\rho$ and $\tau$ as embedding functions and require them to be conjugate
embeddings, then $f$ is specified by $f'(t) = \tau(\rho^{-1}(t))$. In terms of the conjugate function $f^*$, the relation is
$(f^*)'(t) = \rho(\tau^{-1}(t))$. The function $f$ (or $f^*$) is important in constructing the general class of divergence
functions.
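As a quick sanity check of the identities (39)-(41) and (44), one can evaluate them for the first example of Section 1.2, $\rho(t) = t$, $\tau(t) = \log t$, for which $f''(u) = 1/u$, $f'''(u) = -1/u^2$ and all derivatives of $f^*$ equal $\exp$. The sketch below hard-codes these derivatives by hand; the function name and the dictionary keys are illustrative.

```python
import math


def identity_residuals(t):
    """Residuals of (39), (40), (41), (44) for rho(t) = t, tau(t) = log t, t > 0."""
    rho_p, tau_p = 1.0, 1.0 / t          # rho'(t), tau'(t)
    f2 = 1.0 / t                         # f''(rho(t)),  with f''(u) = 1/u
    f3 = -1.0 / t ** 2                   # f'''(rho(t)), with f'''(u) = -1/u**2
    fs2 = math.exp(math.log(t))          # (f*)''(tau(t))  = exp(log t)
    fs3 = fs2                            # (f*)'''(tau(t)) = exp(log t)
    return {
        "39a": f2 * rho_p - tau_p,
        "39b": fs2 * tau_p - rho_p,
        "40": f2 * rho_p ** 2 - fs2 * tau_p ** 2,
        "41": f2 * fs2 - 1.0,
        "44": f3 * rho_p ** 3 + fs3 * tau_p ** 3,
    }


print(identity_residuals(2.5))  # every entry should be zero up to rounding
```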


2.2. Naudts' $\phi$-Logarithm as a Special Case

In his 2004 publication [3], Naudts considered the "deformed" logarithm function as an extension
to the exponential family of densities that is log-linear. Given a strictly increasing and strictly positive
function $\phi : \mathbb{R}^+ \to \mathbb{R}^+$, the $\phi$-logarithm is defined as:

$$ \log_\phi(t) = \int_1^t \frac{1}{\phi(s)}\, ds \qquad (t > 0). \tag{45} $$

The deformed exponential, denoted $\exp_\psi$, is defined by:

$$ \exp_\psi(t) = 1 + \int_0^t \psi(s)\, ds. \tag{46} $$


(Naudts (2004) used the notation $\exp_\phi$, so our current rendition has a subtle difference shown as (48)
and (49) below.) It can be shown that the deformed functions $\log_\phi$ and $\exp_\psi$ are in fact inverse functions
of each other if:

$$ \psi(\log_\phi(t)) = \phi(t), \qquad \psi(t) = \phi(\exp_\psi(t)). \tag{47} $$

Stated alternatively, the deformed logarithmic function $h(t) = \log_\phi(t)$ can be viewed as the solution to
the following integral and its equivalent differential equation:

$$ h(t) = \int_1^t \frac{1}{\psi(h(s))}\, ds, \qquad \frac{dh}{dt} = \frac{1}{\psi(h(t))}, \tag{48} $$

whereas the deformed exponential function $h(t) = \exp_\psi(t)$ can be viewed as the solution to the following
integral and its equivalent differential equation:

$$ h(t) = 1 + \int_0^t \phi(h(s))\, ds, \qquad \frac{dh}{dt} = \phi(h(t)). \tag{49} $$


We now show that the above formulation can be re-written as $(\rho, \tau)$-embeddings with a particular
choice of the $f$ (or equivalently, $f^*$) function. Set $\phi(t) = \rho(t)$ and $f^*(t) = \exp_\psi(t)$, so that $(f^*)'(t) = \psi(t)$
from (46). Therefore, we derive:

$$ \log_\phi(t) = (f^*)^{-1}(t) = \big((f^*)'\big)^{-1}(\rho(t)) = f'(\rho(t)) = \tau(t). $$

That is, when $\phi$ is chosen as the $\rho$-representation, the deformed logarithm $\log_\phi$ turns out to be the
$\tau$-representation, while the deformed exponential is nothing but $f^*$. The relationship (47) is identical
to (10) and (11).

In the $\phi$-logarithm approach, once $\phi$ (that is, $\rho$) is specified, then $\log_\phi$ (that is, $\tau$) is specified, through
the integral relation (45). Viewing $\tau(\cdot) = f'(\rho(\cdot))$, the relation (45) essentially specifies a strictly convex
function $f$, through its derivative $f'$, which operates on $\rho$.


Proposition 2. Denote $\rho = \phi$. The deformed logarithmic transformation $\phi \mapsto \log_\phi$ given by (45) can be
viewed as the function composition $f' : \rho \mapsto f'(\rho)$, where $f$ is given by:

$$ f(\rho(t)) = \rho(t)\, f'(\rho(t)) - t. \tag{50} $$

Equivalently, using the conjugate function $f^*$ given by (9),

$$ \rho = (f^*)' \circ (f^*)^{-1}, \tag{51} $$

or

$$ \big((f^*)^{-1}\big)'(t) = \frac{1}{\rho(t)}. \tag{52} $$

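Proof. From (45), we write:

$$ f'(\rho(t)) = \int_1^t \frac{1}{\rho(s)}\, ds \tag{53} $$

with unknown $f$. Multiply both sides by $\rho'(t)$ and then integrate from one to $x$; the left-hand side of (53)
becomes:

$$ \int_1^x f'(\rho(t))\, \rho'(t)\, dt = \int_1^x f'(\rho(t))\, d\rho(t) = f(\rho(x)) - f(\rho(1)). $$

The right-hand side of (53), after the same operation, is:

\begin{align}
\int_1^x \rho'(t)\, dt \int_1^t \frac{1}{\rho(s)}\, ds
 &= \int_1^x \frac{ds}{\rho(s)} \int_s^x \rho'(t)\, dt = \int_1^x \frac{\rho(x) - \rho(s)}{\rho(s)}\, ds \\
 &= \int_1^x \left( \frac{\rho(x)}{\rho(s)} - 1 \right) ds = \rho(x) \left( \int_1^x \frac{ds}{\rho(s)} \right) - \int_1^x ds = \rho(x)\, f'(\rho(x)) - x + 1.
\end{align}

Clearly, $f'(\rho(1)) = 0$ by (53). We set $f(\rho(1)) = -1$. Comparing expressions from the left- and
right-hand side, we obtain (50).

Applying (9), we obtain the equivalent expression:

$$ f^*\big(f'(\rho(t))\big) = t. $$

That is, $f$ is chosen such that $f^* \circ f'$ is the inverse function of $\rho$, or:

$$ \rho = (f^* \circ f')^{-1} = (f')^{-1} \circ (f^*)^{-1} = (f^*)' \circ (f^*)^{-1}. $$

Hence, (51) holds.

Finally, differentiate the identity:

$$ f^*\big((f^*)^{-1}(t)\big) = t; $$

we obtain:

$$ 1 = (f^*)'\big((f^*)^{-1}(t)\big) \cdot \big((f^*)^{-1}\big)'(t) = \rho(t) \cdot \big((f^*)^{-1}\big)'(t) $$

upon substituting (51). Hence, (52) holds. □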
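To make Proposition 2 concrete, the sketch below takes the hypothetical choice $\phi(s) = s^q$ with $q = 1/2$, for which the integral (45) has the closed form $(t^{1-q} - 1)/(1-q)$ by elementary calculus. It compares a Simpson-rule evaluation of (45) with that closed form, and provides the inverse function so that the relations $f^*(f'(\rho(t))) = t$, (52) and (49) can be checked by finite differences. All function names and the value of $q$ are assumptions made for illustration.

```python
def phi(s, q=0.5):
    """Hypothetical choice phi(s) = s**q (q > 0, q != 1): increasing and positive for s > 0."""
    return s ** q


def log_phi(t, n=2000):
    """Equation (45): integral of 1/phi from 1 to t, by composite Simpson's rule (n even)."""
    h = (t - 1.0) / n
    total = 1.0 / phi(1.0) + 1.0 / phi(t)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) / phi(1.0 + k * h)
    return total * h / 3.0


def log_q(t, q=0.5):
    """Closed form of the same integral: (t**(1 - q) - 1) / (1 - q)."""
    return (t ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(u, q=0.5):
    """Inverse of log_q, playing the role of f*; valid while 1 + (1 - q) * u > 0."""
    return (1.0 + (1.0 - q) * u) ** (1.0 / (1.0 - q))


t, eps = 3.0, 1e-6
print(log_phi(t), log_q(t))                      # quadrature versus closed form
print(exp_q(log_q(t)))                           # f*(tau(t)) should return t
print((log_q(t + eps) - log_q(t - eps)) / (2 * eps), 1.0 / phi(t))  # Equation (52)
```

The last line illustrates (52): the derivative of the deformed logarithm, which is $(f^*)^{-1}$ here, is the reciprocal of $\rho = \phi$.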
The expression (51) in Proposition 2 shows that for any $\rho$, if one can find a decomposition $\rho = g' \circ g^{-1}$
in terms of $g$, then $g$ would be the $\rho$-exponential, $g^{-1}$ the $\rho$-logarithm and $g'$ the linking function. In the
case of the $\phi \mapsto \log_\phi$ transformation, $g = f^*$.

Naudts' [3] deformed logarithm/exponential embedding approach and Zhang's [2]
$(\rho, \tau)$-embedding approach can be seen as playing complementary roles in information geometry:
the former makes it easy to generalize the exponentiation and logarithm as inverse operations
obeying desired differential/integral equations, while the latter makes it apparent how conjugate
$(\rho, \tau)$-embeddings lead to bidualistic expressions for the underlying geometric structures (metric and
conjugate connections).

2.3. Uniqueness of $(\rho, \tau)$-Geometry

It is known [19,20] that the Fisher-Rao metric and $\alpha$-connections (equivalently, the Amari-Chentsov
tensor $T$) are the only invariants of sufficient statistics under the Markov morphism of a random variable.
In [22,23], the Fisher-Rao metric has been extended to allow a weighting function. In [2,6], general
weighting functions for affine connections were made compatible with the generalized (i.e., weighted)
Fisher-Rao metric, since they result from divergence functions that are allowed to have the freedom of
monotone embedding. The recent reinvention [1] constructed weighted connections that turned out to be
identical to the expressions given by [2]. A natural question is, then, whether Zhang's $(\rho, \tau)$-geometry is
the unique construction given the freedom of arbitrary monotone embedding. Below, arguments will be
provided, along with a proof, for a positive answer to this question.

First, when a probability function $p(\zeta|\theta)$ (as a function of a random variable indexed by $\zeta$ and a
background measure $\mu$) is embedded into the parametric manifold $\mathcal{M}_\Theta$, there are several traditional
choices for tangent vectors: $\partial_i p$, $\partial_i \log p$, $\partial_i \sqrt{p}$, etc. Each of these is linked with a weighting function
(expectation operator), so that the tangent vectors are zero-mean random variables:

$$ 0 = E_\mu\{\partial_i p\} = E_\mu\{(p)\, \partial_i \log p\} = E_\mu\{(\sqrt{p})\, \partial_i(\sqrt{p})\} = \cdots \tag{54} $$

where the weighting functions are, respectively, one, $p$, $\sqrt{p}$:

$$ 0 = E_\mu\{\partial_i p\} = E_p\{\partial_i \log p\} = E_{\sqrt{p}}\{\partial_i(\sqrt{p})\} = \cdots $$

For these various choices, the directions of the tangent vectors are all the same. We can consider the above
as special cases of $\rho$-embedding, with $\rho(t) = t$, $\log t$, $\sqrt{t}$, respectively. Because $\partial_i(\rho(p)) = \rho'(p)\, \partial_i p$,
a tangent vector retains its direction with any choice of monotone embedding function.

To investigate the weighting function for general monotone $\rho$-embedding, let us consider the
$f$-normalization (foliation) condition, cf. [21],

$$ E_\mu\{f(\rho(p))\} = 1, \tag{55} $$

where $f$ is a given convex function. Differentiate the above; we get:

$$ 0 = E_\mu\left\{ f'(\rho(p))\, \frac{\partial \rho(p)}{\partial \theta^i} \right\} = E_\mu\{\tau(p)\, \partial_i \rho\}. \tag{56} $$

Therefore, we can see that $\tau(p) = f'(\rho(p))$, what we have called the $f$-conjugate of $\rho$, is precisely the
weighting function to make $\partial_i \rho$ a zero-mean random function at any point of $\mathcal{M}_\Theta$ (i.e., for any value of
$\theta \in \Theta$).

Next, consider the Fisher-Rao metric (1), which can be written as $E_\mu\{\partial_i p\, \partial_j \log p\} =
E_\mu\{\partial_i \log p\, \partial_j p\}$, the pairing of a random function with a random functional under two embeddings $p$
and $\log p$. A natural generalization (see [6]) is to use two (independently chosen) monotone embeddings $\rho$, $\tau$:

$$ g_{ij}(\theta) = E_\mu\{\partial_i \rho\, \partial_j \tau\} = E_\mu\{\partial_j \rho\, \partial_i \tau\} = E_\mu\{\rho'(p)\, \tau'(p)\, \partial_i p\, \partial_j p\}. \tag{57} $$

This is precisely (14), with the weighting function for the Riemannian metric as $f''(\rho(p))\, (\rho'(p))^2 =
\tau'(p)\, \rho'(p)$, when tangent vectors are expressed as $\partial_i p$ (identity representation). When the $\rho$-representation
or $\tau$-representation is adopted, the weighting function is simply $f''(\rho(p))$ or $(f^*)''(\tau(p))$, respectively.

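Third, given the $\rho$, $\tau$ embeddings, we can construct two affine connections on the manifold as follows.
Differentiate (57),

$$ \frac{\partial g_{ij}(\theta)}{\partial \theta^k} = E_\mu\left\{ \frac{\partial^2 \rho(p)}{\partial \theta^k \partial \theta^i}\, \frac{\partial \tau(p)}{\partial \theta^j} \right\} + E_\mu\left\{ \frac{\partial \rho(p)}{\partial \theta^i}\, \frac{\partial^2 \tau(p)}{\partial \theta^k \partial \theta^j} \right\}, \tag{58} $$

and compare with the relation that defines conjugate connections:

$$ \frac{\partial g_{ij}(\theta)}{\partial \theta^k} = \Gamma_{ki,j}(\theta) + \Gamma^*_{kj,i}(\theta); \tag{59} $$

we can identify:

$$ E_\mu\left\{ \frac{\partial^2 \rho(p)}{\partial \theta^k \partial \theta^i}\, \frac{\partial \tau(p)}{\partial \theta^j} \right\} \tag{60} $$

with $\Gamma_{ki,j}$ and:

$$ E_\mu\left\{ \frac{\partial^2 \tau(p)}{\partial \theta^k \partial \theta^j}\, \frac{\partial \rho(p)}{\partial \theta^i} \right\} \tag{61} $$

with $\Gamma^*_{kj,i}$, respectively. Their difference is, by definition, the Amari-Chentsov (0,3)-tensor $T$:

$$ T_{ijk}(\theta) = \Gamma^*_{ij,k}(\theta) - \Gamma_{ij,k}(\theta) = E_\mu\left\{ \frac{\partial^2 \tau(p)}{\partial \theta^i \partial \theta^j}\, \frac{\partial \rho(p)}{\partial \theta^k} - \frac{\partial^2 \rho(p)}{\partial \theta^i \partial \theta^j}\, \frac{\partial \tau(p)}{\partial \theta^k} \right\}. \tag{62} $$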

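Proposition 3. $T$ as given by (62) is a totally symmetric (0,3)-tensor.

Proof. First, we prove that $T(\theta)$ is totally symmetric:

$$ T_{ijk} = T_{jik} = T_{ikj}. \tag{63} $$

Since (62) clearly implies $T_{ijk} = T_{jik}$, we only need to establish $T_{ijk} = T_{ikj}$. Applying the chain rule of
differentiation,

$$ \frac{\partial}{\partial \theta^i} \left( \frac{\partial \tau(p)}{\partial \theta^j}\, \frac{\partial \rho(p)}{\partial \theta^k} \right) = \frac{\partial^2 \tau(p)}{\partial \theta^i \partial \theta^j}\, \frac{\partial \rho(p)}{\partial \theta^k} + \frac{\partial^2 \rho(p)}{\partial \theta^i \partial \theta^k}\, \frac{\partial \tau(p)}{\partial \theta^j}, \tag{64} $$

$$ \frac{\partial}{\partial \theta^i} \left( \frac{\partial \rho(p)}{\partial \theta^j}\, \frac{\partial \tau(p)}{\partial \theta^k} \right) = \frac{\partial^2 \rho(p)}{\partial \theta^i \partial \theta^j}\, \frac{\partial \tau(p)}{\partial \theta^k} + \frac{\partial^2 \tau(p)}{\partial \theta^i \partial \theta^k}\, \frac{\partial \rho(p)}{\partial \theta^j}, \tag{65} $$

and taking into account:

$$ E_\mu\left\{ \frac{\partial \tau(p)}{\partial \theta^j}\, \frac{\partial \rho(p)}{\partial \theta^k} \right\} = E_\mu\left\{ \frac{\partial \tau(p)}{\partial \theta^k}\, \frac{\partial \rho(p)}{\partial \theta^j} \right\}, \tag{66} $$

(62) becomes:

$$ T_{ijk}(\theta) = E_\mu\left\{ \frac{\partial^2 \tau(p)}{\partial \theta^i \partial \theta^k}\, \frac{\partial \rho(p)}{\partial \theta^j} - \frac{\partial^2 \rho(p)}{\partial \theta^i \partial \theta^k}\, \frac{\partial \tau(p)}{\partial \theta^j} \right\} = T_{ikj}(\theta). \tag{67} $$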


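Next, we prove that $T_{ijk}$ is indeed a (0,3)-tensor. This is done through examining the behavior of $T$
under a coordinate transform $\theta \mapsto \tilde\theta$, with the (inverse) Jacobian matrix $\partial \theta^l / \partial \tilde\theta^i$, which affects:

$$ \frac{\partial \rho(p)}{\partial \tilde\theta^i} = \sum_l \frac{\partial \rho(p)}{\partial \theta^l}\, \frac{\partial \theta^l}{\partial \tilde\theta^i}, \qquad \frac{\partial \tau(p)}{\partial \tilde\theta^i} = \sum_l \frac{\partial \tau(p)}{\partial \theta^l}\, \frac{\partial \theta^l}{\partial \tilde\theta^i}, \tag{68} $$

and:

$$ \frac{\partial^2 \rho(p)}{\partial \tilde\theta^i \partial \tilde\theta^j} = \sum_{l,m} \frac{\partial^2 \rho(p)}{\partial \theta^l \partial \theta^m}\, \frac{\partial \theta^l}{\partial \tilde\theta^i}\, \frac{\partial \theta^m}{\partial \tilde\theta^j} + \sum_l \frac{\partial \rho(p)}{\partial \theta^l}\, \frac{\partial^2 \theta^l}{\partial \tilde\theta^i \partial \tilde\theta^j}, \tag{69} $$

$$ \frac{\partial^2 \tau(p)}{\partial \tilde\theta^i \partial \tilde\theta^j} = \sum_{l,m} \frac{\partial^2 \tau(p)}{\partial \theta^l \partial \theta^m}\, \frac{\partial \theta^l}{\partial \tilde\theta^i}\, \frac{\partial \theta^m}{\partial \tilde\theta^j} + \sum_l \frac{\partial \tau(p)}{\partial \theta^l}\, \frac{\partial^2 \theta^l}{\partial \tilde\theta^i \partial \tilde\theta^j}. \tag{70} $$

Therefore:

$$ \tilde T_{ijk}(\tilde\theta) = E_\mu\left\{ \frac{\partial^2 \tau(p)}{\partial \tilde\theta^i \partial \tilde\theta^j}\, \frac{\partial \rho(p)}{\partial \tilde\theta^k} - \frac{\partial^2 \rho(p)}{\partial \tilde\theta^i \partial \tilde\theta^j}\, \frac{\partial \tau(p)}{\partial \tilde\theta^k} \right\} = \sum_{l,m,n} \frac{\partial \theta^l}{\partial \tilde\theta^i}\, \frac{\partial \theta^m}{\partial \tilde\theta^j}\, \frac{\partial \theta^n}{\partial \tilde\theta^k}\, T_{lmn}(\theta), \tag{71} $$

after substituting (69), (70) and (62); the terms containing second derivatives of the coordinate transform cancel
because of the symmetry (66). $T$ indeed transforms to $\tilde T$ in a manner that defines a (0,3)-tensor.
Therefore, the proposition is proven. □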


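We now cast the Amari-Chentsov tensor $T$ in an alternative form that gives an explicit form of the
weighting function. Given $\rho$, $\tau$, because of Lemma 1, there exists another monotone embedding $\sigma$,
such that $\sigma(\rho) = \tau$. Differentiating,

$$ \frac{\partial \sigma(\rho(p))}{\partial \theta^i} = \sigma'(\rho(p))\, \frac{\partial \rho(p)}{\partial \theta^i}. \tag{72} $$

Differentiating again, we obtain:

$$ \frac{\partial^2 \sigma(\rho(p))}{\partial \theta^i \partial \theta^j} = \sigma''(\rho(p))\, \frac{\partial \rho(p)}{\partial \theta^i}\, \frac{\partial \rho(p)}{\partial \theta^j} + \sigma'(\rho(p))\, \frac{\partial^2 \rho(p)}{\partial \theta^i \partial \theta^j}. \tag{73} $$

Substituting the above into (62), we obtain an expression of $T$ in terms of $\rho$ (which plays the role of
embedding function) and $\sigma$ (which plays the role of weighting function):

$$ T_{ijk}(\theta) = E_\mu\left\{ \sigma''(\rho(p))\, \frac{\partial \rho(p)}{\partial \theta^i}\, \frac{\partial \rho(p)}{\partial \theta^j}\, \frac{\partial \rho(p)}{\partial \theta^k} \right\}. \tag{74} $$

Similarly, we can obtain:

$$ T_{ijk}(\theta) = - E_\mu\left\{ (\sigma^{-1})''(\tau(p))\, \frac{\partial \tau(p)}{\partial \theta^i}\, \frac{\partial \tau(p)}{\partial \theta^j}\, \frac{\partial \tau(p)}{\partial \theta^k} \right\}. \tag{75} $$

Therefore, under the $\tau$-representation, $\sigma^{-1}$ (the inverse function of $\sigma$) serves as the weighting function. Note
that $\sigma = f'$, $\sigma^{-1} = (f^*)'$ when $\rho$ and $\tau$ are said to be conjugate. Furthermore, note the negative sign in
(75) compared with (74); this precisely reflects "representation duality" with a $\rho \leftrightarrow \tau$ exchange.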
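The agreement of the three expressions for $T$ can be illustrated numerically. The sketch below assumes the Binomial(2, $\theta$) family with counting measure and the classical pair $\rho = \log$, $\tau = $ identity (so $\sigma = \exp$ and $\sigma^{-1} = \log$). It reads (62) as the expectation of $\partial_i \partial_j \tau\, \partial_k \rho - \partial_i \partial_j \rho\, \partial_k \tau$, i.e., the difference of the two connections identified in the text; the function names are illustrative.

```python
def binomial2(theta):
    """Binomial(2, theta) probabilities with first and second theta-derivatives."""
    p = [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]
    dp = [-2 * (1 - theta), 2 - 4 * theta, 2 * theta]
    d2p = [2.0, -4.0, 2.0]
    return p, dp, d2p


def amari_chentsov_forms(theta):
    """Three expressions of T for rho = log and tau = identity (one-parameter family)."""
    t_diff = t_rho = t_tau = 0.0
    for p, dp, d2p in zip(*binomial2(theta)):
        d_rho, d2_rho = dp / p, d2p / p - (dp / p) ** 2   # derivatives of rho(p) = log p
        d_tau, d2_tau = dp, d2p                           # derivatives of tau(p) = p
        t_diff += d2_tau * d_rho - d2_rho * d_tau         # difference form, reading of (62)
        t_rho += p * d_rho ** 3                           # (74): sigma''(rho(p)) = exp(log p) = p
        t_tau += -(-1.0 / p ** 2) * d_tau ** 3            # (75): (sigma^{-1})''(tau) = -1/tau**2
    return t_diff, t_rho, t_tau


print(amari_chentsov_forms(0.3))  # the three values should coincide up to rounding
```

With this choice, the form (74) is the familiar $E_p\{(\partial \log p)^3\}$, and the sign flip in (75) is absorbed by the negative second derivative of the logarithm.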
To summarize, because $\alpha$-geometry $\{\mathcal{M}, g, T\}$ is uniquely specified given a Riemannian metric $g$
and the Amari-Chentsov tensor $T$, the above derivations show that they both enjoy the freedom of two
monotone/convex functions, with the freedom in specifying $g$ coupled to the freedom in specifying $T$ in
the same way that the metric and connections are coupled via the Codazzi relation for statistical manifolds.
It is noteworthy that the weighting functions used to construct linear, symmetric bilinear and totally symmetric trilinear
functionals (on random functions) turn out to be $f'(\rho(\cdot))$, $f''(\rho(\cdot))$, $f'''(\rho(\cdot))$, respectively.
See [6] for more discussion.

2.4. Representation Duality versus Reference Duality

Going beyond extending $\alpha$-embedding to dual monotonic embeddings, Reference [2] illuminated two
different senses of duality in the $\alpha$-geometry. Prior to [2], there have been several different usages of the
$\alpha$-parameter in Amari's theory of information geometry [10,11]:

(1) parameterizing the divergence functions ($\alpha$-divergences);

(2) parameterizing monotone embedding of probability functions ($\alpha$-embedding);

(3) parameterizing the convex mixture of connections ($\alpha$-connections).

Zhang (2004) [2] showed that (1) and (2) reflect two different types of duality in information geometry,
with (1) concerning the reference/comparison status of a pair of points (functions) expressed in
the divergence function ("reference duality") and (2) concerning their representation under arbitrary
monotone scaling ("representation duality"). Both can lead to (3), the family of $\alpha$-connections.
Therefore, care has to be taken in delineating these two kinds of duality; for instance, the
$\alpha\beta$-connection we derived in (21) reflects how reference duality and representation duality interact in
the alpha-connections.

The present analysis elaborated representation duality in information geometry by working out
the freedom in allowing two (independently chosen) embedding functions $\rho$, $\tau$ or, equivalently, one
embedding function $\rho$ along with a weighting function $f$, while the $(\rho, f)$ pair can be dually chosen to be
the $(\tau, f^*)$ pair. Naudts' (2004) [3] $\phi$-logarithm is but a special case of the $(\rho, \tau)$ duality, in which $f'$ plays
the role of the "integral-of-the-reciprocal" operation, that is, taking the log of a function. This linkage then
leads to $f^*$ and $\tau$ as inverse functions. The phenomenon of biduality emerges because exchanging $\rho \leftrightarrow \tau$
or $(\rho, f) \leftrightarrow (\tau, f^*)$ leaves the Riemannian metric invariant, but switches the two connections (the
latter half of the statement is equivalent to changing the sign of the Amari-Chentsov tensor). Therefore,
the present paper, while elaborating the theory developed in [2], re-asserts the distinction between two
distinct kinds of duality that were originally confounded in Amari's theory of $\alpha$-geometry, one through
the freedom of selecting monotone embedding functions ("representation duality") and the other through
the freedom of assigning referential status to points for pair comparison ("reference duality").

Finally, it is noted that the (bi)dualistic structure of the $(\rho, \tau)$-geometry (generalizing $\alpha$-geometry)
is preserved in the non-parametric (infinite-dimensional) setting as well [4,6], with the $\alpha$-connection
structure cast in a more general way. Theorem 1 of [4] gives non-parametric expressions of the metric
and connections under monotone embedding, mirroring the forms (14) and (15) in the parametric case.
3. Conclusion

The Riemannian metric with the pair of conjugate connections derived by Harsha and Moosath [1] is
identical to the $(\rho, \tau)$-geometry obtained by Zhang in [2]. The $(\rho, \tau)$-embedding also recovers Naudts'
deformed logarithm/exponential formulation. It is further shown in this paper that this $(\rho, \tau)$-geometry
is, when $\alpha$-embedding is relaxed to arbitrary monotone embeddings, the unique extension of
Amari's $\alpha$-geometry in terms of its representational freedom.